Soft Discretization to Enhance the Continuous Decision Tree Induction*
Authors
Abstract
Decision tree induction has been widely used to generate classifiers from training data by recursively splitting the data space. When training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. The common approach is to partition the attribute range into two or more intervals using one or more cut points. An inherent disadvantage of these methods is that the use of sharp (crisp) cut points makes the induced decision trees sensitive to noise. To overcome this problem, this paper presents an alternative method, called soft discretization, based on fuzzy set theory. Whereas a classical decision tree assigns an unknown object to exactly one class, a decision tree based on soft discretization associates a set of possibility values with several or all classes. As a result, even if the object carries uncertainty, the decision tree does not return a single, possibly completely wrong, answer but a set of possibility values. The approach has been successfully applied to an industrial problem of monitoring a typical machining process. Experimental results showed that soft discretization yields better classification accuracy than a classical decision tree in both training and testing, which suggests that the robustness of decision trees can be improved by means of soft discretization.
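The core idea above can be sketched in a few lines: instead of a crisp threshold, a fuzzy transition region around the cut point gives a test example graded membership in both branches, and the class distributions of the two branches are blended by that membership. The following is a minimal illustrative sketch, not the paper's actual algorithm; the linear membership function, the half-width `delta`, and the class names (`normal`/`worn`, loosely evoking the machining-monitoring application) are all assumptions made here for demonstration.

```python
def membership_low(x, t, delta):
    """Degree to which value x belongs to the 'low' branch of a soft split
    at cut point t, with a linear fuzzy region [t - delta, t + delta]."""
    if x <= t - delta:
        return 1.0
    if x >= t + delta:
        return 0.0
    # linear transition inside the fuzzy region
    return (t + delta - x) / (2 * delta)

def class_possibilities(x, t, delta, low_dist, high_dist):
    """Blend the class distributions of the two branches according to
    the membership degree, yielding possibility values per class."""
    mu = membership_low(x, t, delta)
    classes = set(low_dist) | set(high_dist)
    return {c: mu * low_dist.get(c, 0.0) + (1 - mu) * high_dist.get(c, 0.0)
            for c in classes}

# A point near the cut receives graded support for both classes
# instead of a single crisp label:
poss = class_possibilities(5.1, t=5.0, delta=0.5,
                           low_dist={"normal": 0.9, "worn": 0.1},
                           high_dist={"normal": 0.2, "worn": 0.8})
```

With a crisp split (delta → 0), the same point would fall entirely into the high branch and be labeled `worn`; the soft split instead reports that both classes remain possible, which is what makes the tree less sensitive to noise near the cut point.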
Similar resources
On Exploring Soft Discretization of Continuous Attributes
Searching for a binary partition of attribute domains is an important task in data mining. It is present in both decision tree construction and discretization. The most important advantages of decision tree methods are compactness and clearness of knowledge representation as well as high accuracy of classification. Decision tree algorithms also have some drawbacks. In cases of large data tables...
A Decision Boundary based Discretization Technique using Resampling
Many supervised induction algorithms require discrete data, even though real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of the induced models. Usually, discretization and other types of statistical processes are applied to subsets of the population as th...
A Soft Decision Tree
Searching for a binary partition of attribute domains is an important task in Data Mining, particularly in decision tree methods. The most important advantages of decision tree methods are the compactness and clearness of the presented knowledge and the high accuracy of classification. In the case of large data tables, the existing decision tree induction methods often prove inefficient in both comp...
Some Enhancements of Decision Tree Bagging
This paper investigates enhancements of decision tree bagging which mainly aim at improving computation times, but also accuracy. The three questions which are reconsidered are: discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without significant decrease in accuracy. Then a new method...
Investigation and Reduction of Discretization Variance in Decision Tree Induction
This paper focuses on the variance introduced by the discretization techniques used to handle continuous attributes in decision tree induction. Different discretization procedures are first studied empirically, then means to reduce the discretization variance are proposed. The experiment shows that discretization variance is large and that it is possible to reduce it significantly without notable...
Publication date: 2001